Combining Opponent Modeling and Model-Based Reinforcement Learning in a Two-Player Competitive Game
Abstract
When an opponent with a stationary, stochastic policy is encountered in a two-player competitive game, model-free Reinforcement Learning (RL) techniques such as Q-learning and Sarsa(λ) can learn near-optimal counter strategies given enough time. When an agent has learned such counter strategies against multiple diverse opponents, deciding which one to use against a new, unidentified opponent is not trivial. Opponent modeling provides a sound method for making this decision when a policy has already been learned against the new opponent: the policy corresponding to the most likely opponent model can be employed. When an opponent has never been encountered before, however, an appropriate policy may not be available. The proposed solution is to use model-based RL methods in conjunction with separate environment and opponent models. The model-based RL algorithms used are Dyna-Q and value iteration (VI). The environment model lets an agent reuse general knowledge about the game that is not tied to a specific opponent. The opponent models evaluated include Markov chains, mixtures of Markov chains, and Latent Dirichlet Allocation over Markov chains. The latter two are latent variable models, which make predictions for new opponents by estimating their latent (unobserved) parameters. In some situations, I found that this allows good predictive models to be learned quickly for new opponents given data from previous opponents, and I show cases where these models achieve low predictive perplexity (high accuracy) on novel opponents. In theory, these opponent models would enable model-based RL agents to learn best-response strategies in conjunction with an environment model, but converting prediction accuracy into actual game performance is non-trivial. This was not achieved with these methods in the domain studied, a two-player soccer game based on a physics simulation.
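The model-selection idea above — fit a Markov chain to each known opponent, then pick the stored counter policy whose model best explains a new opponent's observed actions — can be sketched as follows. This is an illustrative toy, not the thesis's implementation: the function names, the Laplace-smoothing choice, and the two synthetic opponents ("alternator", "repeater") are all assumptions made for the example.

```python
import numpy as np

def fit_markov_chain(seqs, n_actions, alpha=1.0):
    """Estimate a first-order Markov chain over opponent actions.

    seqs: list of action-index sequences observed from one opponent.
    alpha: Laplace smoothing so unseen transitions keep nonzero probability.
    """
    counts = np.full((n_actions, n_actions), alpha)
    for seq in seqs:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev, nxt] += 1
    return counts / counts.sum(axis=1, keepdims=True)

def log_likelihood(model, seq):
    """Log-probability of an observed action sequence under the chain."""
    return sum(np.log(model[p, n]) for p, n in zip(seq, seq[1:]))

def perplexity(model, seq):
    """Per-transition perplexity; lower means better prediction."""
    return np.exp(-log_likelihood(model, seq) / (len(seq) - 1))

# Toy example: two stationary opponents with different habits.
opp_a = fit_markov_chain([[0, 1, 0, 1, 0, 1, 0, 1]], n_actions=2)
opp_b = fit_markov_chain([[0, 0, 0, 0, 1, 1, 1, 1]], n_actions=2)

# When a new, unidentified opponent appears, select the stored policy
# for whichever learned model makes its observed actions most likely.
observed = [0, 1, 0, 1, 0, 1]
models = {"alternator": opp_a, "repeater": opp_b}
best = max(models, key=lambda k: log_likelihood(models[k], observed))
```

The same likelihood machinery underlies the latent variable models: a mixture of Markov chains scores a new opponent under each mixture component and weights predictions by the posterior over components, rather than committing to a single chain.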
Model-based RL did allow faster learning in the game, but it did not take full advantage of the opponent models; the quality of the environment model appears to be the critical factor in this situation.
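The Dyna-Q algorithm referred to above, which accelerates learning by replaying transitions from a learned model between real environment steps, can be sketched as follows. The corridor task and all parameter values are illustrative assumptions for the example, not the thesis's soccer domain or settings.

```python
import random
from collections import defaultdict

random.seed(0)  # reproducibility for the toy run below

def dyna_q(step_fn, n_actions, episodes=200, planning_steps=10,
           alpha=0.1, gamma=0.95, eps=0.1):
    """Tabular Dyna-Q: each real transition also trains a learned model,
    which is then replayed for extra simulated Q-updates.

    step_fn(s, a) -> (next_state, reward, done) is the real environment.
    The model maps (s, a) -> (s', r, done); deterministic-model assumption.
    """
    Q = defaultdict(float)
    model = {}
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # epsilon-greedy action selection
            a = (random.randrange(n_actions) if random.random() < eps
                 else max(range(n_actions), key=lambda x: Q[(s, x)]))
            s2, r, done = step_fn(s, a)
            target = r if done else r + gamma * max(Q[(s2, x)] for x in range(n_actions))
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            model[(s, a)] = (s2, r, done)
            # Planning: replay remembered transitions from the learned model.
            for _ in range(planning_steps):
                (ps, pa), (ps2, pr, pdone) = random.choice(list(model.items()))
                ptarget = pr if pdone else pr + gamma * max(Q[(ps2, x)] for x in range(n_actions))
                Q[(ps, pa)] += alpha * (ptarget - Q[(ps, pa)])
            s = s2
    return Q

# Toy corridor: action 1 moves right toward a goal at state 4, action 0 moves left.
def corridor(s, a):
    s2 = min(s + 1, 4) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 == 4 else 0.0), s2 == 4

Q = dyna_q(corridor, n_actions=2)
```

With `planning_steps=0` this degrades to plain Q-learning; the extra simulated updates are what give model-based RL its faster learning, which is why the quality of the learned model matters so much.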